Search results for " Formal languages"
showing 10 items of 79 documents
Alignment-free sequence comparison using absent words
2018
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as $q$-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an {\em absent word} of some sequence if it does not oc…
On Combinatorial Generation of Prefix Normal Words
2014
A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present an efficient algorithm for exhaustively listing the prefix normal words with a fixed length. The algorithm is based on the fact that the language of prefix normal words is a bubble language, a class of binary languages with the property that, for any word w in the language, exchanging the first occurrence of 01 by 10 in w results in another word in the language. We prove that each prefix normal word is produced in O(n) amortized time, and conjecture, based on expe…
Languages with mismatches
2007
AbstractIn this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S an…
Unambiguous recognizable two-dimensional languages
2006
We consider the family UREC of unambiguous recognizable two-dimensional languages. We prove that there are recognizable languages that are inherently ambiguous, that is UREC family is a proper subclass of REC family. The result is obtained by showing a necessary condition for unambiguous recognizable languages. Further UREC family coincides with the class of picture languages defined by unambiguous 2OTA and it strictly contains its deterministic counterpart. Some closure and non-closure properties of UREC are presented. Finally we show that it is undecidable whether a given tiling system is unambiguous.
Forbidden words in symbolic dynamics
2000
AbstractWe introduce an equivalence relation≃between functions from N to N. By describing a symbolic dynamical system in terms of forbidden words, we prove that the≃-equivalence class of the function that counts the minimal forbidden words of a system is a topological invariant of the system. We show that the new invariant is independent from previous ones, but it is not characteristic. In the case of sofic systems, we prove that the≃-equivalence of the corresponding functions is a decidable question. As a more special application, we show, by using the new invariant, that two systems associated to Sturmian words having “different slope” are not conjugate.
On the regularity of circular splicing languages : A survey and new developments
2009
Circular splicing has been introduced to model a specific recombinant behaviour of circular DNA, continuing the investigation initiated with linear splicing. In this paper we focus on the relationship between regular circular languages and languages generated by finite circular splicing systems. We survey the known results towards a characterization of the intersection between these two classes and provide new contributions on the open problem of finding this characterization. First, we exhibit a non-regular circular language generated by a circular simple system thus disproving a known result in this area. Then we give new results related to a restrictive class of circular splicing systems…
On the computational power of affine automata
2017
We investigate the computational power of affine automata (AfAs) introduced in [4]. In particular, we present a simpler proof for how to change the cutpoint for any affine language and a method how to reduce error in bounded error case. Moreover, we address to the question of [4] by showing that any affine language can be recognized by an AfA with certain limitation on the entries of affine states and transition matrices. Lastly, we present the first languages shown to be not recognized by AfAs with bounded-error.
Automata and differentiable words
2011
We exhibit the construction of a deterministic automaton that, given k > 0, recognizes the (regular) language of k-differentiable words. Our approach follows a scheme of Crochemore et al. based on minimal forbidden words. We extend this construction to the case of C\infinity-words, i.e., words differentiable arbitrary many times. We thus obtain an infinite automaton for representing the set of C\infinity-words. We derive a classification of C\infinity-words induced by the structure of the automaton. Then, we introduce a new framework for dealing with \infinity-words, based on a three letter alphabet. This allows us to define a compacted version of the automaton, that we use to prove that ev…
Novel Results on the Number of Runs of the Burrows-Wheeler-Transform
2021
The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as measure of repetitiveness, especially to evalua…
On the Number of Closed Factors in a Word
2015
A closed word (a.k.a. periodic-like word or complete first return) is a word whose longest border does not have internal occurrences, or, equivalently, whose longest repeated prefix is not right special. We investigate the structure of closed factors of words. We show that a word of length $n$ contains at least $n+1$ distinct closed factors, and characterize those words having exactly $n+1$ closed factors. Furthermore, we show that a word of length $n$ can contain $\Theta(n^{2})$ many distinct closed factors.